The utility of the surprise question: A useful tool for identifying patients nearing the last phase of life? A systematic review and meta-analysis

Background: The surprise question is widely used to identify patients nearing the last phase of life. Potential differences in accuracy between timeframe, patient subgroups and type of healthcare professionals answering the surprise question have been suggested. Recent studies might give new insights. Aim: To determine the accuracy of the surprise question in predicting death, differentiating by timeframe, patient subgroup and by type of healthcare professional. Design: Systematic review and meta-analysis. Data sources: Electronic databases PubMed, Embase, Cochrane Library, Scopus, Web of Science and CINAHL were searched from inception until 22 January 2021. Studies were eligible if they used the surprise question prospectively and assessed mortality. Sensitivity, specificity, negative predictive value, positive predictive value and c-statistic were calculated. Results: Fifty-nine studies met the inclusion criteria, including 88,268 assessments. The meta-analysis resulted in an estimated sensitivity of 71.4% (95% CI [66.3–76.4]) and specificity of 74.0% (95% CI [69.3–78.6]). The negative predictive value varied from 98.0% (95% CI [97.7–98.3]) with a mortality rate of 5% to 88.6% (95% CI [87.1–90.0]) with a mortality rate of 25%. The positive predictive value varied from 12.6% (95% CI [11.0–14.2]) with a mortality rate of 5% to 47.8% (95% CI [44.2–51.3]) with a mortality rate of 25%. Seven studies provided detailed information on different healthcare professionals answering the surprise question. Conclusion: We found overall reasonable test characteristics for the surprise question. Additionally, this study showed notable differences in performance within patient subgroups. However, we did not find an indication of notable differences between timeframe and healthcare professionals.

• Seven studies provided detailed information on different healthcare professionals answering the surprise question.
Based on these studies we did not find an indication of notable differences between the accuracy of healthcare professionals answering the surprise question.

Introduction
Palliative care aims to improve quality of life and end of life care of patients with life-threatening illnesses and to support their families. Improving end of life care is challenging due to the unpredictable course of chronic diseases. The World Health Organisation's definition of palliative care emphasises timely identification of patients so that they can benefit from it. 1 The surprise question was proposed by Lynn et al. 2 as a screening method to identify patients who might benefit from palliative care. It requires the healthcare professional to answer the question: 'Would I be surprised if this patient were to die in the next 12 months?' 2 (or within a timeframe other than 12 months). Two earlier meta-analyses have studied the accuracy of the surprise question. 3,4 Results from Downar et al. 3 showed a sensitivity of 67.0% and specificity of 80.2%. White et al. 4 showed a pooled accuracy of 74.8%. Both meta-analyses included studies with different timeframes, patient subgroups and healthcare professionals. Downar et al. included studies with a 6, 12 and 18 months timeframe but did not differentiate between timeframes in their results. White et al. included studies with timeframes of 7 days, 30 days, 6 months, 6-12 months and 12 months and stated that an increase in timeframe did not impact the diagnostic accuracy. Both meta-analyses concluded that the surprise question performs better in cancer patients compared to other subgroups. White et al. suggested that doctors appear to be more accurate than nurses in recognising people in their last year of life. 4 However, this finding on the accuracy of the surprise question by type of healthcare professional is based on one study, and more research is needed.
Many studies on the surprise question have been published in recent years, potentially giving new insights, not only into the overall accuracy of the surprise question, but also into potential differences between timeframes, patient subgroups and healthcare professionals answering the surprise question. Therefore, the aim of this systematic review and meta-analysis is to determine the accuracy of the surprise question in predicting death, investigating potential differences by timeframe, patient subgroup and type of healthcare professional answering the surprise question, by answering the following questions:
1. How accurate is the surprise question in identifying patients in the last year of life?
2. Are there differences in accuracy of the surprise question between various timeframes?
3. Are there differences between patient subgroups to identify patients in the last year of life when using the surprise question?
4. Are there differences between healthcare professionals in identifying patients in the last year of life when using the surprise question?

Study design
This study entails a systematic review and meta-analysis of articles studying the accuracy of the surprise question. This study followed the reporting guideline of the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA). 5,6

Data sources and search strategy
A systematic search was performed in six databases from inception until 22 January 2021: PubMed, Embase, Cochrane Library, Scopus, Web of Science and Cumulative Index to Nursing and Allied Health Literature (CINAHL). The search terms 'surprise question', 'Gold Standards Framework' and 'NECPAL' were combined using the Boolean operator OR. The latter two are more elaborate tools to predict the need for end of life care that also use the surprise question 7,8 and were added after an initial pilot search. No filters or limits were applied in the search. Details of the search strategy can be found in Appendix 1. Cross-referencing of included studies was performed.

Eligibility criteria
Inclusion criteria. Studies were included if they met the following criteria:

Study selection
Two reviewers (EvL and LI) independently screened all studies by title and abstract to identify potentially relevant studies. Subsequently, full texts of the remaining studies were assessed by the same two reviewers. Screening of the studies was performed using Rayyan. 9 Disagreements were resolved by discussion until consensus was reached. In case of doubt a third reviewer was consulted (JvD). For non-peer reviewed publications, databases were searched for full-text versions, and these were also requested from the corresponding author. In case of incomplete data or if interpretation of data was unclear, the corresponding author of (potentially) relevant studies was contacted to obtain additional data or information.

Quality of studies assessment
The Quality in Prognosis Studies (QUIPS) tool

Data extraction and statistical analysis
Two reviewers (EvL and LI) independently extracted the following data: study population, type of healthcare professional answering the surprise question, study setting, total subjects, total surprise question assessments, surprise question timeframe, mean age, gender and mortality. A 'no' answer to the surprise question is referred to as a positive answer, whereas a 'yes' answer is referred to as a negative answer. In studies where multiple healthcare professionals answered the surprise question, the study's definition was used to determine whether the answer was positive (this could require consensus in case of a multidisciplinary team or require at least one healthcare professional answering 'no'). If multiple healthcare professionals answered the surprise question and the study provided data separately, the physician's response was used for the meta-analysis when possible. In studies where a third option for answering the surprise question besides 'yes' and 'no' was possible (e.g. 'unsure'), data extraction followed the study's definition of a positive surprise question answer (e.g. 'unsure' was regarded as 'No, I would not be surprised'). Studies were divided into subgroups based on timeframe and patient group (cancer, cardiac disease, emergency department, kidney disease, primary care and pulmonary disease). Patient groups consisting of too few studies for analysis were combined as 'various'. If a study cohort could potentially be classified into two groups (e.g. cardiac and emergency department), the cohort was classified by the underlying organ-specific disease (e.g. cardiac disease). A '6 to 12' month timeframe was considered equivalent to a '12-month' timeframe. If a study contained a derivation and a validation cohort, these were counted as separate cohorts. When a study investigated two different timeframes of the surprise question, both timeframes were included in the analysis.
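The coding convention described above, in which a 'no' answer counts as a positive test, can be illustrated with a minimal Python sketch. It is not the authors' extraction procedure: the function name `tally_2x2`, the `positive_answers` parameter and the example records are hypothetical and only show how answers and outcomes might be tallied into true/false positives and negatives, including a study that treats 'unsure' as a positive answer.

```python
from collections import Counter

def tally_2x2(records, positive_answers=frozenset({"no"})):
    """Tally (answer, died) pairs into the four cells of a 2 x 2 table.

    answer: the reply to 'Would I be surprised if this patient died ...?'
    died:   True if the patient died within the timeframe.
    positive_answers: replies counted as a positive test. A study that
    treats 'unsure' as positive would pass {"no", "unsure"} (assumption).
    """
    counts = Counter()
    for answer, died in records:
        positive = answer.strip().lower() in positive_answers
        if positive and died:
            counts["tp"] += 1   # 'no' and deceased
        elif positive:
            counts["fp"] += 1   # 'no' but alive
        elif died:
            counts["fn"] += 1   # 'yes' but deceased
        else:
            counts["tn"] += 1   # 'yes' and alive
    return {cell: counts[cell] for cell in ("tp", "fp", "fn", "tn")}

# Made-up records, for illustration only.
example = [("No", True), ("No", False), ("Yes", False),
           ("Yes", True), ("Unsure", True)]
print(tally_2x2(example))
print(tally_2x2(example, positive_answers=frozenset({"no", "unsure"})))
```

Passing the set of positive answers explicitly keeps the study-specific definition visible at the call site rather than hard-coding one interpretation of a third answer option.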
The accuracy of the surprise question was analysed by constructing 2 × 2 tables of the surprise question response and mortality for each study. A true positive was defined as 'No, I would not be surprised' and deceased within the predetermined timeframe, and a true negative as 'Yes, I would be surprised' and alive. Sensitivity, specificity, negative predictive value (NPV), positive predictive value (PPV) and confidence intervals (CIs) were calculated for each study. CIs were calculated with Wilson's method. 11 For sensitivity, a correct outcome was a positive surprise question answer ('No, I would not be surprised') in a patient who died during the specified timeframe; for specificity, a correct outcome was a negative surprise question answer ('Yes, I would be surprised') in a patient who did not die during the specified timeframe. NPV represents the percentage of patients surviving when the healthcare professionals predicted survival, and PPV represents the percentage of patients dying when healthcare professionals predicted death within the specified timeframe. A bivariate random effects logistic regression model was used to pool sensitivity and specificity. 12 This model analyses the combination of sensitivity and specificity, and estimates the heterogeneity of sensitivity and specificity between studies and the correlation between these measures. Results from the analyses are presented as pooled sensitivity and specificity. PPV and NPV depend on prevalence of disease or mortality rate. Hence, pooled sensitivity and specificity were used to estimate pooled PPV and NPV with 95% CI for various mortality rates: 5%, 10% and 25%. From the results of this analysis, the summary c-statistic (area under the summary receiver operating characteristic curve) was estimated with formulas described by Walter. 13 The corresponding standard error (SE) was estimated with the Delta method. 14
The heterogeneity measure (τ²), which captures differences between studies beyond the uncertainty reflected in the confidence intervals, was used to estimate the I² statistic. 15 In a second step, we assessed the impact of timeframe, patient group and peer reviewed versus non-peer reviewed studies by including these characteristics in the model. Reporting of the results from the analysis with timeframe was limited to 6 and 12 months, as these were considered most relevant. We performed a likelihood ratio test to assess the influence of non-peer reviewed publications. For each subgroup we estimated pooled sensitivity, specificity, NPV, PPV and the c-statistic with CIs. For the subgroups cardiac, emergency department and pulmonary disease, the analysis showed convergence difficulties, as the correlation between sensitivity and specificity over studies was estimated close to zero. For these analyses, we removed the correlation to obtain reliable results. Statistical analysis was performed with SAS version 9.4. 16 Forest plots were made using Microsoft Excel version 2016. 17 According to Dutch law, ethics approval was not required for this study.
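The per-study measures and the prevalence-dependent predictive values described above can be sketched in a few lines of Python. This is an illustrative sketch and not the authors' SAS analysis: the function names are hypothetical, the Wilson interval is the standard score interval for a single proportion, and `predictive_values` applies Bayes' rule to a given sensitivity, specificity and mortality rate. It returns point estimates only; the confidence intervals reported for the pooled PPV and NPV come from the bivariate model and are not reproduced here.

```python
import math

def wilson_ci(successes, total, z=1.959964):
    """Wilson score interval for a proportion (95% level by default)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half

def accuracy_from_counts(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from one study's 2 x 2 table."""
    return {
        "sensitivity": tp / (tp + fn),  # positive answers among the deceased
        "specificity": tn / (tn + fp),  # negative answers among survivors
        "ppv": tp / (tp + fp),          # deceased among positive answers
        "npv": tn / (tn + fn),          # survivors among negative answers
    }

def predictive_values(sensitivity, specificity, mortality_rate):
    """PPV and NPV implied by sensitivity/specificity at a mortality rate."""
    m = mortality_rate
    ppv = sensitivity * m / (sensitivity * m + (1 - specificity) * (1 - m))
    npv = specificity * (1 - m) / (specificity * (1 - m) + (1 - sensitivity) * m)
    return ppv, npv

# Pooled estimates reported in this review: sensitivity 71.4%, specificity 74.0%.
for rate in (0.05, 0.10, 0.25):
    ppv, npv = predictive_values(0.714, 0.740, rate)
    print(f"mortality {rate:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

With the pooled sensitivity and specificity, the point estimates at mortality rates of 5% and 25% match the PPV and NPV values given in the abstract, which illustrates why the NPV stays high and the PPV stays modest when few patients die within the timeframe.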

Study selection
The systematic search identified 1365 studies, of which 745 were duplicates. 9][20] Of the remaining 623 studies, 500 articles were excluded based on title/abstract screening. Full texts of 123 studies were assessed. Based on full text, 64 articles were excluded. In total 59 studies were included in the meta-analysis. 18, The flowchart of the included studies can be found in Figure 1. Four studies consisted of multiple cohorts: three studies consisted of a derivation and a validation cohort 22,38,52 and one study consisted of two different patient subgroups. 70 In total 63 cohorts were included in our analysis. Four studies used two variants of the surprise question with varying timeframes. 31,44,64,74 Corresponding authors of 35 potentially relevant studies were contacted to obtain the full text or the additional data needed to construct the 2 × 2 table. 72,75 Of the remaining 17 articles, two studies were excluded since they did not use the surprise question to predict death. 79,80 98][99][100][101][102][103]

Study characteristics
Characteristics of included studies can be found in Appendix 2. Studies were heterogeneous in timeframe, population, setting and healthcare professional answering the surprise question (e.g. nurse vs medical specialist). Most studies originated in the United States (20 studies), United Kingdom (nine studies) and The Netherlands (six studies). Forty-five studies took place in the hospital. Of these, 12 studies were performed at haemodialysis units and eight in outpatient clinics. Of the remaining 14 studies, eight took place in general practice/primary care, three in hospice care settings, one in a nursing home and one in a neurorehabilitation centre. One study took place at multiple settings (three primary care centres, one general hospital, one intermediate care centre and four nursing homes). 37 Most studies investigated a 12-month timeframe of the surprise question (48 cohorts). Other timeframes were 3 days, 67 1 week, 31 1 month, 31,36,43,51,56,74 3 months, 44 6 months 22,38,42,46,47,53,64 and 24 months. 57 Four studies used two variants of the surprise question with varying timeframes. 31,44,64,74 In general, patients included were adults (>18 years), except for one study performed in children. 44 Eighteen studies included patients with kidney disease, 12 patients with cancer, seven with cardiac disease, seven included a diverse group of patients in general practice/primary care, six studies included patients with pulmonary disease and five studies included patients from the emergency department. In seven studies the surprise question was answered by various healthcare professionals. 26,45,50,57,60,71,75 In two studies answering the surprise question was based on consensus of a multidisciplinary team. 30,44 The mortality rate across all studies was on average 11.85% and varied between studies from 0.99% (primary care) 76 to 78.78% (advanced cancer patients at the emergency department). 63
In total five of the included studies added a third option for answering the surprise question besides 'yes' and 'no', including 'don't know this patient well enough', 26 'don't know', 66 'unsure', 48 'uncertain' 49 and 'defer'. 71 In total these answers represent 61 of 88,268 surprise question assessments, varying from 6% 48 to 9% 71 per study. In two studies this percentage could not be retrieved. 26,49

Quality assessment: Risk of bias
A detailed overview of the risk of bias assessment is presented in Appendix 3. Three studies had a high risk of bias (two non-peer reviewed), 13 studies (eight non-peer reviewed) had a moderate risk of bias and 43 studies (six non-peer reviewed) had a low risk of bias. Most methodological issues were in study population (domain 1: eight high and 30 intermediate risk of bias) and study confounding (domain 5: two high and 17 intermediate risk of bias). In many studies a risk of selection bias was caused by not specifying the eligible population. An intermediate or high-risk assessment in study confounding was in most studies due to the setting and patient population (e.g. haemodialysis patients) or caused by planning an intervention based on the outcome of the surprise question.
Results from the subgroup analysis including timeframe subgroups (6- and 12-months), patient subgroups and peer reviewed versus non-peer reviewed subgroups can be found in Table 1.
In seven studies multiple healthcare professionals answered the surprise question. Due to the heterogeneity of the results (different patient subgroups, different healthcare professionals answering the surprise question with different seniority and different intensity in care provision to the patient) we could not perform a meta-analysis on this subgroup. An overview of the accuracy of the surprise question by different healthcare professionals can be found in Table 2. The study by Da Silva Gane et al. 26 investigated the variability between nephrologists and nurses of different levels of seniority (referred to as 'bands'). They conclude that nephrologists perform better compared to nurses based on a higher sensitivity and similar specificity. The study of Lakin et al. 57 also shows that primary care physicians have a higher sensitivity compared to nurse care coordinators. On the contrary, the results of Valerio and Farinha 75 show that nurses have a higher sensitivity and lower specificity compared to nephrologists, and the results of Straw et al. 60 show that heart failure nurses have a higher sensitivity compared to cardiologists, trainee-grade doctors and non-specialist nurses. Similar performances between healthcare professionals are seen in the study by Mudge et al. 50 when comparing doctors and senior nurses and by Rauh et al. 71 when comparing doctors, nurses and advanced practice providers. Ebke et al. 45 compare the accuracy of answering the surprise question by neurorehabilitation physicians and palliative care physicians, with palliative care physicians having a higher sensitivity and lower specificity. In five other studies multiple healthcare professionals answered the surprise question; however, no separate data were reported. 40,52,55,68,74
Based on these studies we did not find clear evidence for a difference between the accuracy of healthcare professionals answering the surprise question.

Strengths and limitations
This study has a number of strengths. First of all, each part of the review process was independently undertaken by two reviewers. Furthermore, a high number of studies have been included. This can be explained by (1) the increased attention for palliative care and the surprise question, resulting in a high number of recently published studies, (2) the effort made to obtain additional data by contacting authors and (3) including non-peer reviewed studies: 16 of the 59 included studies were non-peer reviewed studies, mostly conference abstracts. We also included the non-peer reviewed studies in an effort to avoid publication bias of favourable outcomes. 104 A limitation of including non-peer reviewed studies is that they did not provide sufficient information for a comprehensive quality assessment, which could have led to a relatively negative quality assessment. Furthermore, we observed a high degree of heterogeneity, with an overall I² of 98.2% and 98.4% for sensitivity and specificity respectively. The analysis with subgroups (i.e. timeframe, patient subgroups and type of publication) still showed a high degree of heterogeneity. This can be explained by the enormous diversity in included studies, reflecting the different real-life circumstances in which the surprise question is used, and its versatile nature. Furthermore, the accuracy of the surprise question may be overestimated due to a possible self-fulfilling prophecy: a positive answer to the surprise question ('No, I would not be surprised') could lead to, consciously or subconsciously, discussing goals of care, thereby potentially influencing outcome. Finally, c-statistics were estimated with an easy to apply formula, which may result in a slight over-estimation. 13

Comparison to other literature
As described earlier, two meta-analyses were performed on the accuracy of the surprise question by Downar et al. 3 and White et al. 4 Despite this, the subjectiveness and accuracy of using the surprise question are still debated. 105,106 The previous meta-analyses included 17 and 22 cohorts, with 11,621 and 25,718 surprise question assessments respectively, compared to 63 cohorts and 88,268 surprise question assessments in this study. Moreover, Downar et al. did not include 'Gold Standards Framework' in the search, therefore missing studies that did not mention the surprise question in title or abstract. Furthermore, both meta-analyses report a substantial risk of bias in their included studies. Indeed, in our assessment, most pre-2017 studies have an increased risk of bias whereas more recent studies seem to be of better methodological quality. Hence, our results may be more reliable due to the increase of surprise question assessments included and improved methodological quality of included studies. This study shows similar results in overall accuracy in predicting death compared to the previous meta-analyses. Downar et al. reported a sensitivity of 67.0% and a specificity of 80.2% compared to 71.4% and 74.0% respectively in our study. The c-statistic (area under the curve) of Downar et al. 3 was 0.81 [0.78-0.84] compared to 0.79 [0.77-0.81] in our meta-analysis. De Bock et al. 107 studied the accuracy of the Supportive and Palliative Care Indicators Tool (SPICT) in a geriatric population and report a higher sensitivity of 84.1% and a lower specificity of 57.9% compared to our results of the surprise question.
White et al. stated that an increase in timeframe did not impact the diagnostic accuracy. Our study showed similar sensitivity for the 6- and 12-month timeframes. However, we found a lower specificity for a 6-month timeframe compared to a 12-month timeframe. Our study confirms the previous conclusions that the surprise question performs better in cancer patients compared to other subgroups. We did not find clear evidence for a difference between the accuracy of healthcare professionals answering the surprise question, in contrast to an earlier suggestion by White et al. 4 that doctors seem to be more accurate than nurses in recognising people in the last year of life.

Implications for practice
A systematic review by Cardona-Morrell et al. 108 indicated that on average 33%-38% of patients nearing their end of life receive non-beneficial treatments in the last 6 months of their life. Advance care planning can have a positive effect on end of life care, decrease life-sustaining treatment, increase use of hospice and palliative care, prevent hospital admissions and improve goal-concordant care. 109 Timely identification of patients who could potentially benefit from advance care planning is important. 110 The importance of advance care planning increases when nearing the end of life. Hence, prognostication of mortality can be used as a proxy for initiating advance care planning. The surprise question is an easy to use tool 2 and does not require large amounts of clinical data compared to other available screening tools. 111 These characteristics and the reasonable accuracy in predicting death, with a fairly high NPV across various mortality rates, make the surprise question an appropriate screening tool for initiating advance care planning. Additionally, patients with a positive answer to the surprise question ('No, I would not be surprised') are likely to be vulnerable and may therefore benefit from advance care planning regardless of whether they die exactly within the specified timeframe. Furthermore, initiating advance care planning 'too early' does not seem to cause damage. 109 The results of this systematic review and meta-analysis encourage the use of the surprise question as a screening tool by various healthcare professionals, not exclusively by doctors. We think the surprise question should not solely be seen as an indicator of prognostication of death but rather as an opportunity for renewed attention for quality of care and shared decision making by timely initiating advance care planning.

Conclusion
We found overall reasonable test characteristics for the surprise question. Additionally, this study showed notable differences in performance within patient subgroups. However, we did not find an indication of notable differences between timeframe and healthcare professionals. We submit that the surprise question is an appropriate tool for initiating advance care planning.

Authorship
EvL, LI and JvD conceived and designed the study. EvL and LI collected the data, critically appraised the articles and drafted the manuscript. NZ performed the statistical analyses. All authors critically revised the manuscript and approved the final version to be published.

Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Table 1. Diagnostic accuracy of the surprise question.
ED: emergency department; AUC: area under the curve; I²: heterogeneity; CI: confidence interval; PPV: positive predictive value; NPV: negative predictive value.
a 2/12 cohorts were analysed with two separate timeframes.
b 1/6 cohort was analysed with two separate timeframes.
c 1/12 cohort was analysed with two separate timeframes.
d 3/47 cohorts were analysed with two separate timeframes.
e 1/16 cohort was analysed with two separate timeframes.

Table 2. Accuracy of the surprise question by type of healthcare professional.
Band 5 nurses are less senior nurses. Band 6 nurses are of intermediate seniority and band 7/8 are senior nurses.
**CIs are only provided when presented in the original study.

in hospitalized older patients: a prospective cohort study. Int J Nurs Stud 2020; 109: 103609.
67. Ikari T, Hiratsuka Y, Yamaguchi T, et al. '3-Day Surprise Question' to predict prognosis of advanced cancer patients with impending death: multicenter prospective observational study. Cancer Med 2021; 10: 1018-1026.
68. Estifan Kasabji J, Lucas C, Sastre A, et al. Is the surprise question useful as a predictor of mortality in hemodialysis patients? Nephrol Dial Transplant 2020; 35: iii1452.
69. Lai C-F, Cheng C-I, Chang C-H, et al. Integrating the surprise question, palliative care screening tool, and clinical risk models to identify peritoneal dialysis patients with high one-year mortality. J Pain Symptom Manag 2020; 60(3): 613-621.e6.
70. Maes H, Van Den Noortgate N, De Brauwer I, et al. Prognostic value of the surprise question for one-year mortality in older patients: a prospective multicenter study in acute geriatric and cardiology units. Acta Clin Belg 2022; 77: 286-294.
71. Rauh LA, Sullivan MW, Camacho F, et al. Validation of the surprise question in gynecologic oncology: a one-question screen to promote palliative care integration and advance care planning. Gynecol Oncol 2020; 157: 754-758.
72. Tabernero Huguet E, Ortiz de Urbina Antia B, González Quero B, et al. Prevalence and mortality of patients with palliative needs in an acute respiratory setting. Arch Bronconeumol 2021; 57: 729.
73. Tak N, Moor C, Owusuaa C, et al. The value of the surprise question to predict mortality in idiopathic pulmonary fibrosis. Eur Respir J 2020; 56: 1800.
74. Tripp D, Janis J, Jarrett B, et al. How well does the surprise question predict 1-year mortality for patients admitted with COPD? J Gen Intern Med 2021; 36: 2656-2662.
75. Valerio P and Farinha A. Surprise question: a mortality predictor in hemodialysis patients? J Am Soc Nephrol 2020; 31: 407.
76. van Wijmen MPS, Schweitzer BPM, Pasman HR, et al. Identifying patients who could benefit from palliative care by making use of the general practice information system: the surprise question versus the SPICT. Fam Pract 2020; 37: 641-647.
77. Yen Y-F, Lee Y-L, Hu HY, et al. Early palliative care: the surprise question and the palliative care screening tool-better together. BMJ Support Palliat Care. Epub ahead of print 25 May 2020. DOI: 10.1136/bmjspcare-2019-002116.
78. Ermers DJ, Kuip EJ, Veldhoven C, et al. Timely identification of patients in need of palliative care using the double surprise question: a prospective study on outpatients with cancer. Palliat Med 2021; 35: 592-602.
79. Milnes S, Orford NR, Berkeley L, et al. A prospective observational study of prevalence and outcomes of patients with gold standard framework criteria in a tertiary regional Australian Hospital. BMJ Support Palliat Care 2019; 9: 92-99.
80. Ramsenthaler C, Haberland B, Schneider S, et al. Identifying patients with cancer appropriate for early referral to palliative care using the integrated palliative care outcome scale (IPOS)-a cross-sectional study of acceptability and deriving valid cut-points for screening. Palliat Med 2018; 32: 98.